The computerized data system will have the same impact upon society
that the printing press had, and learned people are reacting as learned
people reacted when printing was developed. They are arguing that the
new technology will enslave the common man and should be suppressed. In
a sense both are right. The only possible defense against the printing
press is to learn to read. We succeeded that time. Print fonts and
spelling were standardized so everyone could learn to read, and then we
insisted that everyone learn to read. Now we have to standardize the
interface to computerized data systems and teach everyone to use them.

A person who is illiterate in computerized data handling in the future
world will be as defenseless as the person who cannot read is
defenseless now.

Andrei Ershov, Soviet Academician
paraphrased from address, Spring
Joint Computer Conference, 1967


This paper is about data in our society. It first discusses
values, how people feel about using data. Then it discusses how man has
used the technology that is available as tools in handling data, and how
data handling has changed as technology changes. Information resource
management is now changing very rapidly as new technology becomes
available.

The paper argues that the complex programs we call data base
management systems (DBMS) are an artifact of monolithic computers with
hierarchies of secondary storage managed by complex general purpose
operating systems. The next step will take the DBMS apart, distributing
the data management functions to provide the same capabilities with
fewer constraints.

The paper concludes with a discussion of theory regarding computer-
based data management. It covers the intellectual ferment following the
introduction of random access secondary storage, the development of
CODASYL, and the relational model. It briefly discusses the familiar
arguments regarding the need to separate logical data definition from
the physical location of data, stressing different representational
requirements. The paper concludes with the author's view of what is
needed to form an adequate foundation for information resource
management system development.


VALUES  PAST 

Yes, Hans Olderwick was my brother, and so was Andrew Olafson. We
didn't all use the same last name in Norway. If a family all used the
same last name, the government could keep records, and it would use
those records against the family. It would draft men for the army. It
would use them to find out how much tax the family could afford.

Conversation  with  my  grandfather,  Ivor  Olson 
circa 1945

Traditionally, most cultures had a decided cultural bias toward
limiting data collection, based upon an experienced distrust of people
who wanted data. Much of the opposition to the introduction of social
security in this country was based upon the argument that it would
enable the government to keep accurate records. My father carefully
asked the lumberjacks who worked for him what name they wanted him to
give social security. He didn't ask their name since that was agreed to
be none of his business.

Even  how-to  information  was  controlled  and  access  to  information 
required  membership  in  organizations  owning  the  information.  Crafts 
from ironworkers to miners to bridgebuilders were passed on to initiates
and  others  were  carefully  excluded.  Pythagoras  set  the  penalty  for 
teaching  his  geometry  to  unauthorized  people  at  death,  and  rigorously 
enforced  it. 

The development of the modern world was based in large part upon
learning how to obtain and use information. Trade secrets were
captured, codified and taught. Patent systems were introduced to
encourage inventors to share the secret details of their inventions.
Formal procedures for managing data were developed. The library and the
filing cabinet were augmented by the computerized repository. With
facts available to support theorizing, technology exploded. The
scientific method formalized a role for experimentation, collecting and
retaining data to prove or disprove hypotheses. And as we did these
things, our attitudes changed.

We are in agreement that how-to knowledge should be formalized and
taught, including agreement that access to teaching should not be
arbitrarily limited.

Our attitudes towards data are in a state of flux, as witness the
same Congress passing one bill guaranteeing privacy and another access
to government information without serious debate on the unavoidable
conflict between these two goals.

We are developing an appreciation for data as a major property
right of organizations.

We appear to have reached a consensus that capturing data is
legitimate unless other laws must be violated to capture it, with the
resulting data belonging to the people who capture it. We are also
reaching a consensus that if access to data is offered to anyone, then
objective standards for access to data should be developed and enforced.
In other words, that the standard commercial laws apply to data. More
and more, access to collections of data is being offered in return for
money.

VALUES  PROBLEMS 

I ask you to really believe that the computer was developed thirty
years earlier than it was, and the state of the computer art in 1936 was
what it is now. Now, remember what Hitler did to the Jews? Without a
computer, could he have done it?

Professor Joe Weizenbaum, MIT
personal conversation, Aug 1966

Almost fifty years after his demise, it's hard to remember that
Hitler didn't have computers. Using only manual records, he defined the
term 'Jewish ancestry', defined a Jew as a person with one fourth Jewish
ancestry, and then he found over 95% of them, some of whom didn't know
their ancestry. This required a massive effort involving major
resources. We would be wise to remember that if there is a next time,
finding people who meet any arbitrary definition will be easy.
Computer-based data will be available somewhere.

A computer has no ethics. It is a tool, doing what it is
programmed to do. There is literally nothing a computer can do that
determined people willing to pay the price cannot do without it, with
the possible exception of real-time control. What the computer does is
change the cost and difficulty of doing things, often by many orders of
magnitude.

Much of the advantage, or disadvantage, from a computer-based
information system is due to the analysis that preceded it. The most
effective approach is to stop just before installation of the computer,
and consider what you really want to do.

We have the ability to provide access to accurate data to many
people under many conditions. In using this ability, we have learned
that many traditional procedures were wisely designed to have data, and
limit access to that data by making access difficult.

An example is that the full personnel record of each employee under
his/her supervision is available to each supervisor in the US federal
service. The supervisor doesn't have to state a reason for wanting to
review the file. However, the supervisor must make an appointment and
go to another office, usually located in another building, to read the
file. When computers and communication equipment enabled access to
personnel files maintained by a personnel office to be provided in the
supervisor's office, some organizations provided that access. They
removed it because employees felt the supervisors were misusing it,
snooping into their personal affairs without sufficient cause.


The world we live in has fact availability far beyond our
understanding of the interrelationships between facts and the meanings
that can be inferred from them. We have changed from a world where new
information was scarce to one so inundated in data that almost any
conceivable information can be extracted from data that is routinely
captured. We haven't yet learned how to use data or to limit misuse of
data.


We haven't learned how to limit access to data, but are rapidly
developing a consensus that data in computerized form belongs to
somebody who is responsible for maintaining integrity of the data. The
owner of the data is also responsible for authorizing access to data.
The usual authorization for access involves paying money, and that money
is used to maintain both the data and the organization that owns it.


The most grandiose data availability scheme is Euronet(1), where
the European telephone companies offer a service. Anyone who wants to
sell access to computerized information can enter the net. They just
have to provide a standard gateway. Anyone with appropriate equipment
who wants to use it can do so through the phone lines. Euronet will
bill the user and pay the providers. In the United States, various data
services are available. However, the user must usually sign up to the
individual data service, and receives individual bills for usage.

We haven't yet learned how to exploit the data we have. Above all
we don't know how to recognize and correlate data that are in fact
different aspects of the same subject. This might be called the 'blind
man and elephant' problem. Modern analysis is agreed to be concept
driven. Technology to go from undigested facts to knowledge of interest
demands a model relating the facts to the knowledge. Model building is
just as difficult as it has always been. However, model validation is
much easier. We have data now, so we can check the validity of our
models if we choose to do so.

To get the facts first is impossible. There are no facts unless
one has a criterion of relevance. Events by themselves are not
facts....

The only rigorous method, the only one that enables us to test an
opinion against reality, is based on the clear recognition that opinions
come first, and that is the way it should be. Then no one can fail to
see that we start out with untested hypotheses, in decision making as in
science the only starting point. We know what to do with
hypotheses: one does not argue with them, one tests them.

Peter Drucker, The Effective Executive, 1967

We now have the data and processing power to validate individual
models, and are finding out that the traditional concepts 'everybody
knew' are either gross over-simplifications, or else simply wrong.

For example, the people who built the first computer-aided language
translators found that the grammar they had been taught was both
inadequate and wrong. As an instance, English adjectives do have order,
and changing order changes meaning.


Wrong concepts lead to wrong decisions. Traditionally this didn't
matter much, since decisions had so little impact. The impact of
decisions depends upon the ability to carry them out. The better the
technology available to carry out decisions, the more destructive wrong
concepts can be. Our technology is now very powerful. We urgently need
much better tools to adapt concepts to conform to the evidence at hand.

If witness is needed, the behavior of economic systems managed by
Keynesian or Marxist models over-proves the point. Nations have
literally destroyed their economies by actions taken in accordance with
one of these models. Yet each of these models absolutely demands both a
static world and a world where information is free and always available.
The decision makers who have ruined their own fortunes as well as their
nation must have known the world isn't static and struggled to attain
the information they needed to do their jobs. Yet they took the actions
recommended by the model they lived by, and then constructed ingenious
explanations after each failure of their system to behave as their model
predicted. And continued to use the same model.

We urgently need to develop the judgement to both question and use
computer-based predictions. 'The computer says' is sometimes used as a
final argument, and that is absurd. The implications behind 'the
computer says' are that some model has been programmed into the
computer, data has been fed into the model, and the computer has 'made'
a prediction. The prediction is as good as the model behind it, which
can be very good indeed if the model has been validated, and the
assumptions upon which it is based have not been changed. It can also
be nonsense.

Sometimes the 'model' hasn't been stated, let alone validated in a
defined domain. One of the successful techniques of Artificial
Intelligence is to model the behavior of selected experts, to develop
what is called an expert system. Properly constructed and used, expert
systems can provide a useful aid to decision making. However, there is
much danger in using the system indiscriminately without either a sound
knowledge base or the judgement of the expert being modelled. We face a
real danger in accepting computer predictions and recommendations
without adequate consideration of the limitations inherent in the
technology upon which they are based.

As we move into the information age, we need methodology to manage
concepts in the plural, to track the multitudinous possible
interpretations of data and to continually ask the question, 'What would
disprove this concept?'. We need methodology that we trust enough that
when a concept is proven incorrect we accept the proof and use it to
develop better concepts. The people who use information well will have
the same transient advantage the London bankers had when they knew of
Waterloo the day before their competitors. Their colleagues who use
facts to validate concepts will have the more enduring advantages the
Japanese have shown in adjusting their economy to change.

VALUES  FUTURE 

Information is the name for the content of what is exchanged with
the outer world as we adjust to it, and make our adjustments felt upon
it. The process of receiving and of using information is the process of
our adjusting to the contingencies of the outer environment and of our
living effectively within that environment.

It becomes plausible that information ... belongs among the great
concepts of science such as matter, energy and electric charge. Our
adjustment to the world around us depends upon the informational windows
that our senses provide.

Norbert Wiener
Philosophy of Science, lb*o

The ability to validate concepts won't be enough for the world that
is on the horizon.

As we move into the information age, we need methodology to manage
concepts in the plural, to track the multitudinous possible
interpretations of data on the one hand, and to continually ask the
contradictory question, 'what would disprove this concept' on the other.

We need methodology that we trust enough that when a concept is
proven incorrect we accept the proof. Then we need methodology to deal
with the disproven concepts. Sometimes we will be able to use the proof
that our concept is incorrect to develop better concepts. Sometimes we
will only know that what we thought was correct is wrong, without enough
knowledge to identify what is right. This is the most difficult of all
intellectual situations. We have to be able to accept the fact that we
don't know. We must also find a way to describe what is happening well
enough to bring brains to bear on the newly identified missing concept.

The important point is that we must be able to apply the scientific
method to the concepts from which we infer meaning from data. However,
technology is of no value unless it is used. We need to do what we
know. When a theory is proven or disproven, and our previous beliefs
are shown to be in error, we must develop the cultural capability to
accept the fact. Even when we don't have another theory to replace the
one proven to be in error.

TECHNOLOGY PAST


In the design of data base management facilities, particularly the
self-contained functions and the host language data manipulation
facilities, there is always some underlying philosophy which governs the
compromises made in both design and implementation of the system, and
therefore provides a basis for understanding the system.

Feature Analysis of Generalized Database
Management Systems, CODASYL Systems
Committee Technical Report, May 1971

The earliest machines that could reasonably be called digital
computers were designed to support census data analysis. Data engines
preceded arithmetic engines. However, in the beginning, data was
managed for a known set of reasons, with one specific viewpoint. In
fact, most data structures were embedded into procedural languages such
as COBOL and PL/1.

We often forget the impact of technology. As long as tape was the
only available secondary storage, data aggregates of any size could only
be accessed serially and computerized data management was management of
files. Consequently, additional uses for data involved constructing new
files to support access to the data. Secondary storage that would
support random access to data, once developed, provided a great
increase in capability, coupled with enormous additional complexity.

The DBMS approach is an outgrowth of recognizing that data
collected for one purpose could be used for many other purposes as well,
and should be collected and managed as a resource of the enterprise as a
whole. This significant insight coupled with the development of
hierarchies of secondary storage led to development of complex programs
to support management of the enterprise database.

A number of commercial software packages have been developed to
perform data management. Of course, these DBMS assume the computing
environment in which they were developed. That is an essentially Von
Neumann machine with an operating system to manage the resources of the
system including all storage, and the necessity to provide all other data
usage support inside the DBMS.

The data base management system as an integrated package of
software enabled the computer to be used to manage data as a resource.
This was a step increase in capability to handle data, paid for with
complex software, and the necessity to define an overall worldview
(schema). The DBMS approach assumes a data administration function,
where the schema and subschema (views) are defined. "They assume the
data administrator has all the smarts, and the rest of the users are all
chimpanzees," to quote a disgruntled colleague.(2) While this colorful
phrase overstates for emphasis, it makes a pertinent point. The data
administrator is responsible for maintenance of the enterprise world
view, as well as the consistency and structure of the data under
administration.
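
The split between an enterprise schema and a user's subschema can be
illustrated with a minimal sketch. It uses Python's built-in sqlite3
module purely as a stand-in for a DBMS; the table, view and column names
(employee, phone_book, salary and so on) are hypothetical and are not
taken from any system discussed in this paper.

```python
import sqlite3


def build_enterprise_db():
    """Create an in-memory database with one schema table and one subschema view."""
    conn = sqlite3.connect(":memory:")
    # Schema: the enterprise-wide definition kept by the data administrator.
    conn.execute(
        "CREATE TABLE employee ("
        "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [(1, "Ada", "R&D", 900), (2, "Ben", "Sales", 700)],
    )
    # Subschema (view): the part of the schema one class of user may see.
    conn.execute("CREATE VIEW phone_book AS SELECT name, dept FROM employee")
    return conn


def columns_of(conn, relation):
    """Return the column names visible through a table or view."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [column[0] for column in cursor.description]
```

A user working through phone_book sees the same rows as the full table
but only the name and dept columns; the salary column exists only in the
administrator's schema.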


In implementing a particular data management system, an explicit
data model is desirable, and separate semantic and syntactic models of
proposed DBMS implementations usually lead to simpler logic and more
understanding. Interactions between the two models can be readily
documented, leading to explicit documentation of both
representation-independent problem statements and the implementation
decisions. Unfortunately, currently available data models cannot deal
with more than a single overall perspective. To go farther will require
advances in knowledge representation.

Data models are usually referred to as hierarchical, where data
items can have only one 'root', network, where data items can have two or
more 'roots', and relational, where data is represented as relations.
Both network and hierarchical data models are based mathematically upon
set theory and the relational model is based upon first order predicate
calculus. These models have been proven in one-to-one correspondence
with one another in the sense that each can carry the same
information, cf. (3), and the particular model to be used in a particular
application is primarily a matter of preference. However, the
appropriate model for the individual application and individuals using
the model can greatly increase understanding of the underlying
processes. The hierarchical model was developed first, the network
second, and the relational last. Each had a decided advantage over its
predecessors in certain applications. Each was strongly promoted after
it was developed. In fact, the relational model developed a following
with many characteristics of a cult, and recent advances in data models
are referred to as post-relational. Post-relational models combine the
characteristics of the hierarchical and relational.
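
The claim that the three models can carry the same information can be
made concrete with a small sketch. The data and the helper names
(relation_from_owners, owners_from_relation) are invented for
illustration; the sketch shows only a round trip on toy data, not the
formal correspondence proof cited above.

```python
# Hierarchical: every member hangs under exactly one 'root' (here, a department).
hierarchical = {"Sales": ["Ada", "Ben"], "R&D": ["Cy"]}

# Network: the same member may be owned by two or more 'roots'
# (a department and a project).
network = {
    "departments": {"Sales": ["Ada", "Ben"], "R&D": ["Cy"]},
    "projects": {"Apollo": ["Ada", "Cy"], "Zeus": ["Ben"]},
}


def relation_from_owners(owners):
    """Flatten an owner -> members structure into a set of (member, owner) tuples."""
    return {(member, owner) for owner, members in owners.items() for member in members}


def owners_from_relation(relation):
    """Rebuild the owner -> members structure from a set of (member, owner) tuples."""
    owners = {}
    for member, owner in sorted(relation):
        owners.setdefault(owner, []).append(member)
    return owners


# Relational: the same facts held as two relations.
works_in = relation_from_owners(network["departments"])
assigned_to = relation_from_owners(network["projects"])
```

Converting works_in back with owners_from_relation reproduces the
hierarchical form, so no information is lost in either direction for
this toy case. Which form is easier to reason about depends on the
application, which is the point made above.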

As is usual when step increases are achieved, they are highly
valued for a time, after which they are taken for granted as a utility.
The proper place for utilities in computer science is in the operating
system, and data management functions are beginning to migrate to the
operating system. The first such function was management of input to,
and output from, the computer.

The DBMS provided the capability to use all the data of the
enterprise as a coherent whole. This capability to use data inside the
DBMS as a resource leads naturally to the desire to combine that data
with other data not in the DBMS. Most modern DBMS packages include the
capability to access other files resident in the same machine by
declaring the file to the DBMS. However, the situation becomes more
complex when the data to be accessed resides in another DBMS,
particularly a different DBMS hosted by a different machine.

STANDARDS 

Perhaps  the  biggest  problem  in  the  industry  today  is  that  ... 
approaches  to  data  management  are  not  only  incompatible  with  each  other, 
but  also  are  often  data  incompatible.... 

Feature Analysis of Generalized Database
Management Systems, CODASYL Systems Committee
Technical Report, May 1971

The initial response to the desire to use data from different DBMS
together was to try to develop standards. In the simplest case, such as
the Euronet discussed above, where data is simply retrieved from
different sources and moved to another for processing, this has worked
quite effectively. The designers of Euronet declared an interface that
a DBMS must have to enter the net, and vendors interested in selling
data through Euronet built it. In fact, most of them changed their DBMS
to make the Euronet interface a standard access procedure instead of
building a separate interface. The Euronet experience suggests that many
users desiring data from outside their enterprise are satisfied by a
network of accessible data bases. They are willing to identify the
appropriate data, retrieve it, and move it into their own DBMS for
further processing.

Standards are necessary to enable things from different sources to
work together, and useful standards are urgently needed in data
management. Formal standards activities are in progress to facilitate
the development and exploitation of DBMS functions. The ANSI/SPARC DBMS
model has 36 components and 44 interfaces designed to explicate the
functions of DBMS, and the standards necessary to provide
integration.(4) ANSI Standards Group X3H4, Information Resource
Dictionary Systems, is the first formal product of SPARC activity. It is
in the process of setting interface and content standards. The National
Bureau of Standards has also recently published specifications for a
standard data dictionary system.(5) Data dictionaries are being offered
for sale with defined interfaces to commercial DBMS, where the data
dictionary and DBMS run together. However, to date, the interface to
each DBMS is being separately constructed. No overall standard is yet
implemented.

A relational data base task group was chartered in 1975, and has
published a feature catalog of systems which is a precursor to standards
definition. This will at least resolve a great deal of confusion
regarding what is, and what is not, relational.(6)

Attempts at more detailed standards, particularly attempts to
standardize on one computer and one DBMS, haven't been very successful.
There are three primary reasons. First, DBMS and data structures
determine how the DBMS behaves. Forcing several functions to use the
same DBMS has led to unacceptable operational inefficiencies. Second,
there is always the data that wasn't thought of when the overall system
was designed, and that is now resident on a different host with a
different DBMS, costing a fortune in time and money to change it but
needing to be used together. Third and most important, data held in
identical DBMS on identical computers still can't be used together
without additional software. What is needed for a single DBMS would be
one DBMS managing data in several computers at several nodes. This is
still a research subject. However, practical considerations of data
ownership and control limit the applicability of the single DBMS-many
computers approach to using data from different computers together.
Capabilities to use DBMS together are required.

THE UNIVERSE EXPANDS

We want more.

Sam Gompers, at various times, I&ou-iy24


The first idea to facilitate using DBMS together was to implement
the query language of one on the other. Ari Shoshani(7), then of SDC,
has proven this is in general impossible. However, Robert Frankel(8),
then at the Wharton School, developed an interpretive interface called a
Functional Query Language (FQL) in which general programs can be
written, and which can be built 'on top of' any DBMS. Frankel
demonstrated a general capability to put a general 'top' on different
DBMS that will allow one program to run against all of them. FQL is
used extensively in building software for commercial sale.
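
The idea of one program running against several DBMS through a common
'top' can be sketched as follows. This is not FQL syntax and makes no
claim about how FQL is implemented; it is a minimal illustration in
which two invented backends (SqliteBackend and TreeBackend, both
hypothetical names) expose the same employees() function, and one
program, names_in_dept, is written only against that function.

```python
import sqlite3


class SqliteBackend:
    """A relational store, standing in for one kind of DBMS."""

    def __init__(self, rows):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
        self.conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)

    def employees(self):
        """Yield every employee as a dict with 'name' and 'dept' keys."""
        for name, dept in self.conn.execute("SELECT name, dept FROM employee"):
            yield {"name": name, "dept": dept}


class TreeBackend:
    """A hierarchical store (dept -> names), standing in for a different DBMS."""

    def __init__(self, tree):
        self.tree = tree

    def employees(self):
        """Yield every employee in the same dict shape as SqliteBackend."""
        for dept, names in self.tree.items():
            for name in names:
                yield {"name": name, "dept": dept}


def names_in_dept(backend, dept):
    """One program, written once, that runs against any backend."""
    return sorted(r["name"] for r in backend.employees() if r["dept"] == dept)
```

Adding another store means writing one more employees() adapter; the
program names_in_dept does not change.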

The first operational system designed to support access to
disparate heterogeneous data bases is the ADAPT capability on the highly
classified COINS network at the National Security Agency. However,
adding the next data base to ADAPT as originally designed becomes
progressively more difficult as the number of DBMS increases. Adding
another DBMS to FQL does not introduce additional difficulties, and
rebuilding ADAPT on FQL has been proposed to NSA. There is no published
data regarding usage of ADAPT. However, ADAPT runs on a PDP-11/70
attached to the COINS network and queue buildup is not excessive,
leading to the inference that queries involving two or more COINS DBMS
are comparatively infrequent.(9)


FQL and similar systems solve part of the problem of using data
bases together since data can be retrieved from several heterogeneous
systems using the same access capability. However, an FQL still
requires that the user of the data know which DBMS holds the data to be
accessed. L.S. Schneider, then of Martin Marietta Corporation,
designed and built a relational query compiler that could access
distributed heterogeneous data bases without requiring the user to know
where the data resided.(10) However, this compiler required extensive
computations. Additional research is required to establish how to build
an operational system with the capability to access distributed
heterogeneous data bases where the user doesn't need to know where the
data is located.
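
What 'not needing to know where the data is located' means for the user
can be shown with a very small sketch. It is far simpler than a query
compiler and is not a description of Schneider's system; it shows only
the lookup step, in which a catalog maps a logical data set name to
whichever source holds it. The class name Catalog and its methods are
assumptions made for this illustration.

```python
class Catalog:
    """Maps a logical data set name to the source that holds it."""

    def __init__(self):
        self._sources = {}

    def register(self, dataset, source_name, fetch):
        """Record that `dataset` lives in `source_name`; `fetch` returns its rows."""
        self._sources[dataset] = (source_name, fetch)

    def fetch(self, dataset):
        """Return the rows of `dataset` without the caller naming a source."""
        if dataset not in self._sources:
            raise KeyError(f"no source registered for {dataset!r}")
        _source_name, fetch = self._sources[dataset]
        return list(fetch())

    def location_of(self, dataset):
        """Tell an administrator, not the user, where the data actually is."""
        return self._sources[dataset][0]
```

The user asks for a data set by name; only the catalog, which an
administrator maintains, records which DBMS is involved.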

The difficult problems in using data from different DBMS arise when
data from one DBMS is used to specify the data to be retrieved from
another DBMS. This capability would certainly be useful. For example:
'identify the poison from these symptoms and find out where the
antidotes can be obtained.' Symptoms of poisons and location of
antidotes are almost certainly in different DBMS. However, DBMS can be
used effectively together without this capability.
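
The poison and antidote example can be sketched in a few lines. What
follows is only an illustration under stated assumptions: two in-memory
SQLite data bases stand in for the two separate DBMS, and every table
name, column name and row is invented here. The point it shows is the
two-step shape of the query, where the answer from the first data base
specifies what is retrieved from the second.

```python
import sqlite3

# Two independent in-memory data bases stand in for two separate DBMS.
# All table names, column names and rows are invented for illustration.
toxicology = sqlite3.connect(":memory:")
toxicology.executescript("""
    CREATE TABLE poison_symptoms (poison TEXT, symptom TEXT);
    INSERT INTO poison_symptoms VALUES
        ('poison_a', 'nausea'), ('poison_a', 'blurred vision'),
        ('poison_b', 'nausea'), ('poison_b', 'rash');
    CREATE TABLE antidotes (poison TEXT, antidote TEXT);
    INSERT INTO antidotes VALUES
        ('poison_a', 'antidote_x'), ('poison_b', 'antidote_y');
""")

inventory = sqlite3.connect(":memory:")
inventory.executescript("""
    CREATE TABLE stock (item TEXT, location TEXT);
    INSERT INTO stock VALUES
        ('antidote_x', 'county hospital'), ('antidote_x', 'depot 7'),
        ('antidote_y', 'city clinic');
""")


def antidote_locations(symptoms):
    """Identify poisons showing every symptom, then locate their antidotes."""
    marks = ",".join("?" * len(symptoms))
    # Step 1: ask the first data base which antidotes apply.
    rows = toxicology.execute(
        f"""SELECT a.antidote
            FROM poison_symptoms s JOIN antidotes a ON a.poison = s.poison
            WHERE s.symptom IN ({marks})
            GROUP BY s.poison, a.antidote
            HAVING COUNT(DISTINCT s.symptom) = ?""",
        (*symptoms, len(symptoms)),
    ).fetchall()
    items = [row[0] for row in rows]
    if not items:
        return []
    # Step 2: the result of step 1 specifies the retrieval from the second.
    marks = ",".join("?" * len(items))
    return inventory.execute(
        f"SELECT item, location FROM stock WHERE item IN ({marks}) "
        "ORDER BY item, location",
        items,
    ).fetchall()
```

The calling program, not either data base, carries the intermediate
result across. That is the step a user must perform by hand when the
capability is missing.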

PRACTICAL  ISSUES 


The  ultimate  objective  of  database  systems  is  to  make  application 
development  cheaper,  faster  and  more  flexible.  It  is  important  not  to 
lose  sight  of  this  overriding  objective  as  we  discuss  the  complex 
mechanics  of  database  systems. 

James  Martin 

Computer  Database  Organization,  1977 


Does it do what I want it to do?

almost  all  users  everywhere 

In practice, getting data into the computer in the desired format,
with suitable accuracy checks so applications can be performed, almost
always takes half the operating resources required to operate DBMS.
Operational DBMS with good reputations were designed to ... the
economics of ... capture generally have received surprisingly little
theoretical attention. This is understandable, since data input and data
capture use different technology and require skills and orientation
outside computer science. What is not understandable is that DBMS are
almost invariably evaluated in terms of the ability to respond to an ad
hoc query. The ad hoc query is particularly inapposite in that many ...

Experience has also established that for many applications, DBMS
technology isn't desirable. DBMS software has an overhead that should
only be paid when additional usages of the data become important to the
enterprise, since data kept in files is much more efficiently managed.
Files can still be accessed, to enable the data to be used for other ...
accessed. Otherwise retrieving the data in that order will saturate the
DBMS, crowding out other users. Often, a DBMS is used to facilitate
access to data in files, or even outside the computer system.

With data in machine readable form ready to enter the system in
ever-increasing quantities, the importance of designing the system to
capture this raw data, do whatever transformations are required and
enter the data, including integrity checks, indexing and building other
access mechanisms, in time synchronized with the data arrival speed
is critical.


The overall system must be designed to operate with human
intervention only where human decision-making is required. This again
involves detailed understanding of the enterprise. The enterprise model
is invaluable in designing the data system to act in accordance with a
prescribed algorithm.

Of course, sometimes the prescribed algorithm has unrecognized side
effects. It is now almost impossible to buy the standard unstylish
oxford woman's shoe. This style remained the same for decades, selling
enough to justify stocking it to women who preferred substance to style.
They are the least expensive product for a shoe manufacturer to make,
since they do not involve changing production machinery or working
instructions. They are also the least expensive shoe to stock since
they will not go out of style. However, the algorithm that was
introduced to manage production of women's shoes, and apparently sold to
all the manufacturers, assumed that sales of each style would peak and
die off. It is important to stop production to prevent unsalable
no-longer-stylish shoes. So a 'stop production when sales no longer
increase' operator was built into the production manager. Oxford shoes
signal stop production, and the production manager stops. To date, the
traditional oxford sources have not chosen to override their
programs.
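
The side effect is easy to reproduce. The sketch below is an
assumption about how such a rule might be written, not the
manufacturers' actual program; the function name and the sales figures
are invented. It shows that a rule tuned for styles that peak and die
off also halts a style whose sales are merely steady.

```python
def should_stop_production(sales_by_period):
    """Hypothetical rule: stop when the latest sales failed to increase."""
    if len(sales_by_period) < 2:
        return False  # not enough history to judge
    return sales_by_period[-1] <= sales_by_period[-2]


# Invented figures: a fashion style peaks and declines, the oxford is flat.
fashion_style = [100, 400, 900, 700]
oxford_style = [300, 300, 300, 300]

for name, sales in (("fashion", fashion_style), ("oxford", oxford_style)):
    print(name, "stop" if should_stop_production(sales) else "continue")
```

Both styles are stopped. The rule cannot tell a steady seller from a
dying one, because "no longer increasing" describes both.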
A great deal of theoretical attention has been paid to preventing
problems in concurrency. These problems arise when one person or
program is updating a datum or set of related data while another is
using it, or when two or more sources are updating the same data. This
is solved by appropriate locking of files being updated, or by
time-stamping to assure that data from different update cycles is not
inadvertently used together. Actually, this problem is not often
serious in practice, since data is almost always updated by only one
source and little damage is done by locking the domain in which updating
is taking place.
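
Both remedies fit in a short sketch. The class and function names
below are invented for illustration, and the sketch assumes a single
process with threads rather than a real DBMS. A lock covers the whole
domain being updated, and a cycle stamp lets a reader check that two
pieces of data come from the same update cycle.

```python
import threading


class StampedStore:
    """Hypothetical store whose values carry an update-cycle stamp."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cycle = 0
        self._values = {}

    def update(self, **changes):
        # Lock the whole domain while updating, then stamp the new cycle.
        with self._lock:
            self._values.update(changes)
            self._cycle += 1
            return self._cycle

    def snapshot(self):
        # Readers take the same lock, so no half-applied update is seen.
        with self._lock:
            return self._cycle, dict(self._values)


def same_cycle(snapshot_a, snapshot_b):
    """True when two snapshots carry the same update-cycle stamp."""
    return snapshot_a[0] == snapshot_b[0]
```

Locking the entire store is the coarse choice the paragraph describes:
with a single updating source, the cost of excluding readers for the
length of one update is small.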

A closely related problem can be critical if two or more users
expect to use the same item. The military example is where two or more
commanders each plan to use the same airplane or shoot the same
ammunition. In practice, the resource allocation problem is usually
avoided by administrative action of the organization using the data
base. The military know who owns the ammunition and the airplane, and an
individual who doesn't own it will get authorization from one who
does before planning to use it. In airline ticket sales, the
transaction takes place when the ticket is actually sold, which is
defined as deleting it from the data base. The ticket buyer knows that
until then the ticket s/he was told about is subject to sale.


Another related problem is data security. It is accepted that if a
clever person can get into a computer system, they may have access
to everything in the system. This hasn't been really true under all
conditions for several years, but the problem of how to allow
individuals access to part of an integrated data base while preventing
access to other parts of the same data base does not have an agreed-upon
solution. In practice, starting with a DBMS and schema definition that
permit current verification of data in terms of access authorization,
and then using a combination of limiting access to trusted individuals
and administrative sanctions provides acceptable security in most
situations. Theoretical work has focused upon building provably secure
systems. At present it is possible to prove the correctness of only
very small fragments of code. Certifiably secure data systems are
expected to become available in the late 1980s.

Sometimes conventional DBMS and analytic approaches lead to
unacceptable time or resource requirements. This is caused by very
complex analytic processes and/or very large data aggregates. The
prototype difficult problem is unfolding hills in geological analysis.
Several interesting problems remain unsolved. However, successful
approaches to these problems have been based upon simulation, notably by
Bougie of the Pierre and Marie Curie Institute, Paris, France.(12)
Another approach which is being exploited to solve the very difficult
problems is through artificial intelligence modeling of experts in the
subject area.


As mentioned above, the primary user of most DBMS is software, not
direct query by people. The requirements for effective interface to
programs are different from those for interface to humans. In either
case it is important not to throw away knowledge regarding data.

Some theoreticians argue that resources are rapidly decreasing in
price while programmers and users are becoming more expensive.
Consequently they argue that the DBMS design should not consider
resources except in terms of demand upon human users. The problem with
this argument is that in practice however many resources have been
provided, uses for these resources rapidly exceed the supply. Upgrading
a computer center is still a major undertaking. In consequence, how the
DBMS interacts with the primary intended use must be considered if it
is to provide effective support.


Computer systems generally behave better if designed by the old
Catholic principle of subsidiarity: never do anything at a higher level
if it can be done at a lower level. Have the computer perform data
selection at the first point where all necessary data is available to
minimize data handling. Keep support close to what is being supported;
don't let support of individual devices propagate far beyond the device
being supported. Keep user support close to the user, printer-specific
commands close to the printer, and disk directories on the disk itself.
Developments on the horizon may well make these practices much easier.
However, experience to date suggests that neglect of these practical
issues will still lead to unacceptable system performance.

Technological changes are beginning ... approaches to implementing
the capabilities ... technologies are converging.


First, microchips have been developed that allow procedures to be
placed in the disks, printers and other peripherals. This provides a
great increase in data handling capability. A data base machine can be
fruitfully analyzed as a very smart disk. Data base machines are
providing a capability to manage flat files very efficiently. These
machines have elements of parallelism, and several such machines can be
used together as back-end to data management systems. Microchips from
which complete data base machines can be tailored are being built to
provide a foundation for construction of machines tailored for optimal
support to specific applications.


Next, microprocessors with user support software at the work
station are beginning to provide the integrated data view previously
requiring DBMS software, and DBMS overhead. Work stations managing the
subschema appropriate to their applications, accessing data in data base
machines or even computer-based file systems, can be a very adaptable
paradigm for information resource management, providing DBMS
capabilities without DBMS overhead. In fact, for many applications a
central processing unit is not necessary. Microprocessor-based work
stations support individuals who share a common data base in a data-base
machine.


The microprocessor-based logic being introduced ... storage devices
enables decisions to be made ... process, limiting the data that must be
..., greatly reducing the processing required ... that only high-level
'how I want it to look' ... passes to the printer, again reducing
processing ... peripherals enables much simpler software packages to
perform ...

Most micro-based home computer systems have consistent environments
... drives include ... smart disks and printers. For ... index on each
floppy ... commercially available disks for larger systems have
equivalent capability. Users of these home computers demand the same
consistency and support in their major systems.


Third, the data requirements of Knowledge Management, Active Data
Base systems, Artificial Intelligence and Decision Aids require
interfaces to data to work effectively, and these interfaces are
difficult to implement in conjunction with conventional DBMS. Prototype
systems are being constructed with LISP machines attached directly to
data base machines without involving a general computer.

Finally, application development environments are being built, ...
This would have the added advantage that data management functions are
available in conjunction with the full processing power of the ...
This is important for many forms of analysis. ... processing, with
consequent delays and programming complexity required to perform
analysis.

The computer language Ada(13), developed by the DoD for embedded
computer functions and to be mandatory for mission-critical software, is
an example of such an application support environment. Ada is also the
first operational language to include real abstract data types. As
knowledge regarding how to build computer languages has developed, some
computer scientists have argued that both operating systems and data
management should be brought back into the languages. Ada can be viewed
as the culmination of this drive.

A DBMS, ADAPLEX, is being developed for Ada. Documentation
regarding ADAPLEX implies that Ada packages are being written in Ada to
collectively perform the data management functions without violating the
Ada groundrules. However, reading details leads to the conclusion that
ADAPLEX will be a conventional DBMS written in Ada. This is
unfortunate, particularly since ADAPLEX appears to be building a data
language, DAPLEX, in Ada for ADAPLEX. It appears that Ada programs will
not be able to access data held in ADAPLEX without using DAPLEX! This
appears to be a violation of the spirit of Ada.(14)

In the full Ada application support environment, equivalent
capabilities can be provided inside the Ada Programming Support
Environment without accepting the limitations now imposed by DBMS.
These data handling capabilities can be provided by building a system of
access 'engines', each of which corresponds to a known database system
feature. These access engines should include a contiguous data group
engine, a linear-list engine, a network navigation engine and a
first-order logical (relational) engine. This will provide a framework
from which customized engines can be built to interface with the current
DoD data files and DBMS, ensuring that current data can be accessed
from an Ada environment without degrading that environment.


Each of these engines should be implemented as a complete 'closed'
function which might or might not make use of other engines. Each
should also be a run-time support engine invoked by Ada programs as
merely a different device or device type. There is no inherent limit to
the number and kind of engines that an Ada program could 'open' during a
run-time session. Application-specific engines can be constructed in
this framework, such as an engine for exploring geometrical descriptions
used in cartography, route search, etc., as required by navigation,
mapping and intelligence analysis. As newer features such as decision
aids become available, new engines can be constructed to represent these
as directly as possible, probably using existing engines for support.
Engines can be transitioned to hardware if desired, without disrupting
the processing environment in which they reside. This approach ensures
an open evolutionary information resource management capability inside
the Ada APSE, whereas the traditional data model approach is a
prescription for obsolescence.(15)
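
The engine framework can be sketched as follows. The paper proposes it
for Ada; Python is used here only to keep the illustration short, and
every class, function and registry name is an assumption made for this
sketch, not part of ADAPLEX or the APSE. It shows engines that share
one interface, are opened by name like devices, and may be built on
other engines.

```python
from abc import ABC, abstractmethod


class AccessEngine(ABC):
    """Hypothetical common interface: a program opens an engine like a device."""

    @abstractmethod
    def fetch(self, request):
        ...


class LinearListEngine(AccessEngine):
    """Scans a list in order; the request is a predicate on one item."""

    def __init__(self, items):
        self._items = list(items)

    def fetch(self, request):
        return [item for item in self._items if request(item)]


class RelationalEngine(AccessEngine):
    """Selection and projection over rows, supported by a linear-list engine."""

    def __init__(self, rows):
        self._scan = LinearListEngine(rows)

    def fetch(self, request):
        where, columns = request
        return [tuple(row[c] for c in columns) for row in self._scan.fetch(where)]


ENGINES = {}


def register(name, factory):
    """Add an engine type; new kinds can be added without changing callers."""
    ENGINES[name] = factory


def open_engine(name, source):
    """Open an engine by name over a data source."""
    return ENGINES[name](source)


register("linear-list", LinearListEngine)
register("relational", RelationalEngine)
```

A caller would write, for instance, `open_engine("relational", rows)`
and then call `fetch` with a predicate and a column list. Because
callers name an engine rather than a data model, an application-specific
engine can be registered later without changing existing programs.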


THE BUILDING OF THEORY

In the living of life, every mind must face the unyielding rock of
reality, of a truth that does not bend. To some men this will be an
exultant challenge: that so much can be known and truth not be
exhausted, that so much is still to be sought... To others this is a
humiliation not to be borne, for it marks out sharply the limits of our
proud minds.

St Thomas Aquinas, Summa Theologica

The new capabilities available when secondary storage was developed
provided a great increase in capability, coupled with enormous
additional complexity. This complexity was dealt with intellectually in
many ways.

A DBMS holds data abstracted from some real situation, and is
therefore a model of reality. This fact was recognized very early, as
was the fundamental problem of mapping abstractions of reality into
linear space, and the critical importance of being able to represent the
inherent logical structure of the data.

The CODASYL committee developed its model, which was intended more
as a language with which to talk about all the new capabilities that had
become possible than as something to be developed in its totality.(16)
They defined terminology which has been generally accepted: a data
dictionary to carry information about data; a data definition
language and a data manipulation language which are usually at least
closely related and consistent in definition and usage; and so on. They
defined the overall view of the data the enterprise seeks to manage as a
schema, and individual subschemas (now being referred to as viewpoints)
of the individual activities to be supported by the data base.

The CODASYL committee intended to update their model as new
capabilities became possible, maintaining a coherent 'state of data
handling' as technology developed. To a large extent they have
succeeded in this goal. The DBMS as we know it can reasonably be called
the child of CODASYL. However, the very success in developing one
coherent framework led to monolithic integrated data management systems
that became very difficult to adapt to changing technology.

Most of the early permanently valuable abstract theorizing
regarding data handling from the intellectual ferment following the
introduction of random accessible secondary storage appears to have
originated in the IBM proprietary Universal Information Systems
Technology (UIST) project, led by Dr Mike Senko, circa 1968-1971. Dr
Senko simply said, 'There are laws of how data behaves in storage
media. We can and should codify these laws.' The Data Independent
Accessing Methodology (DIAM) was developed by Dr Senko to provide a
framework in which to report these laws. The UIST project was dissolved
in 1971, and was reported in a three-part article in the IBM Journal
entitled 'Data Independent Accessing Methodology (DIAM)'.(17)
Unfortunately, this article uses obscure notation and is extremely
difficult to read. DIAM breaks information systems into layers of
abstraction, with a formal model of each layer, and a formal description
of the relationships between the layers. This turns out to be both
powerful and necessary, and has become the standard procedure for
discussing many different aspects of information resource management.

Simulators built upon the DIAM theory have achieved sufficient
fidelity that the vendors of the system tested requested that the
results not be published for proprietary reasons.

DIAM argues that information management systems are best described
in terms of levels of functionality. The enterprise level defines the
world-view from which the data is abstracted, and identifies semantic
relationships between data. The information level identifies what
queries are possible and provides mechanisms to select from possible
queries. The access path level identifies what access paths to the data
have been implemented and provides mechanisms for selecting among
implemented paths. The implemented DIAM simulator also allows
identification of access paths which are implementable in the DBMS
structure, but which have not yet been implemented. The encoding level
identifies how each access path was or is to be implemented and selects
from among implementation techniques. The address space level defines
where each access path is implemented and selects from among logical
files. And finally the physical device level identifies the device to
which the path is assigned, selecting from among physical devices.(18)

The enterprise level requires a model capable of carrying semantic
data, and some form of entity-set model is optimum. A semantic model is
designed to capture and exploit the semantic similarities existing
between different data models. The Entity-Category-Relationship model
of data developed by Honeywell is an excellent basis for enterprise
modeling.(19) Building an enterprise level model of the expanded
enterprise facilitates using disparate DBMS together.(20) The
information layer in DIAM and the access path layer are each best
described in terms of the relational model. The relational model can be
used without loss of generality to develop highly restrictive typing of
access paths, and is the most powerful known model for syntactic
interface and access path determination. The encoding level is
system-specific since implementation techniques must be selected in
terms of their efficiency. Both the address space and physical device
levels define specific relationships in the particular hardware.


Another approach to the new capabilities to be achieved by
secondary storage was developed by Dr Codd, also believed to have been a
member of the UIST.(21) In essence he said, 'the new capabilities are
all very well, but what matters is how humans use the data, and instead
of optimizing usage I'll optimize the human interface and waste as much
storage and processing power as necessary. People think in relations;
I'll build a relational data base.' The result in practice was the
implementation of a single data management system as a series of flat
files, with complex software to perform comparatively simple operations
upon them. This led to comparatively easy construction of queries.
However, in practice there is no technical reason why the relational
model should propagate beyond the user interface, and many reasons why
more efficient software is developed when it does not. We are back to
the problem of how to use our secondary storage.

Dr Codd and Dr Senko are known to have been at least in extreme
competitive disagreement. This may well have retarded the recognition
that their viewpoints need not conflict. In retrospect, much of the
theorizing regarding DBMS during the 1970s appears to be extension of
the dialogue between Drs Senko and Codd at the beginning of the decade.

Experience in using DBMS has established that the usual DBMS access
is by application programs and not by individuals with ad hoc queries,
so Dr Codd's arguments for the relational model lack generality.
However, the well-designed relational user interface eases development
of application programs, as well as facilitating end-user queries. In
view of Dr Codd's rationale for introducing the relational model, it is
ironic that one of the strongest arguments in favor of it is that a
relational user interface facilitates restructuring a DBMS for
efficiency of operation. It is a tribute to the pervasiveness of Dr
Codd's arguments that the semantic models now being developed are
referred to as post-relational.

The relational model is based upon the first order predicate
calculus, which also provides a sound mathematical basis for primitive
data handling procedures. Unfortunately the first order predicate
calculus does not include such operations as count and sum, so at least
a second order calculus needs to be developed. It is a serious
theoretical limitation, and one reason why in practice mathematics isn't
used in DBMS construction or usage.
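
The distinction drawn here can be illustrated informally. In the
sketch below the relation, the function names and the figures are all
invented, and nothing in it settles the formal question. It only shows
that selection and projection examine tuples one at a time, while count
and sum are functions of the whole set.

```python
# An invented relation: (name, department, salary).
employees = {
    ("ann", "ops", 30),
    ("bob", "ops", 20),
    ("cy", "lab", 50),
}


def select(relation, predicate):
    """Tuple-at-a-time: each tuple is tested on its own."""
    return {t for t in relation if predicate(t)}


def project(relation, *positions):
    """Tuple-at-a-time: keep only the chosen columns of each tuple."""
    return {tuple(t[i] for i in positions) for t in relation}


def count(relation):
    """Aggregate: depends on the set as a whole, not on any single tuple."""
    return len(relation)


def total(relation, position):
    """Aggregate: sums one column over the whole set."""
    return sum(t[position] for t in relation)


ops = select(employees, lambda t: t[1] == "ops")
print(project(ops, 0), count(ops), total(ops, 2))
```

Note that `total` must be applied before projecting away the other
columns. Under set semantics, two employees with equal salaries would
collapse into one tuple if the salary column were projected out first.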

Post-relational DBMS theorizing usually uses extended set theory.
The relational model is subsumed under this theory. In practice, the
predicate calculus appears to be appropriate to modeling the information
interface in DBMS, and should be extended to cover the data situations
of interest.

FOR  A  FIRM  FOUNDATION 


Mind alone can and does discover heretofore unknown integral
pattern concepts and generalized principles, apparently holding true
throughout whole fields of experience. And once discovered by mind the
concepts of the generalized principles become additional special-case
experiences and are stored in the brain bank and retrievable thereafter
by  the  brain.  But  brains  and  their  externalized  detachedly  operating 
descendants — the  electronic  computer — can  only  search  out  and  program 
the already experienced concepts, and mind alone can recognize and
capture the unknown and unexpectedly existent, ergo, unsearchable,
unwatched-for, generalized principles.

Intuition, R. Buckminster Fuller, 1972


At some point in our development of understanding about a subject,
it becomes possible to state ways to determine facts and operations upon
those facts, which always lead to other facts. At this point, the
operations are called mathematics, and the determination of facts is
called applied mathematics. The understanding about the subject can
then be called science.

Professor whose name I don't remember,
circa 1948


To provide a sound mathematical basis for complete data handling,
the set theory and relational algebra need to be extended to at least a
second-order calculus to deal with such operations as count and sum.
It also requires extension to deal with questions of completeness
and relevance such as 'What data is known, unknown, not available, or
inconsistent with other data?' and 'What data is relevant to this query?
Is it more or less relevant than other data? If certain data is
relevant to a query, what other data is still needed to answer the query
and how can this need be expressed?'

Relational algebra can be extended to deal with these questions by
extending or interpreting the notion of domain to give rigorous
definition to the concepts of unknown, incomplete, inconsistent and not
applicable; adding operators to the algebra and extending the domain
definition of existing ones; discovering the laws for the manipulation
of the extended algebra; and developing algorithms for manipulating the
algebra, particularly for the purpose of computing completeness and
relevance. Such successful extension of relational algebra would lead to
the ability to state, formally, what data is or is not relevant to a
given situation so that inferences can then be made.(22)
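
One small piece of such an extension can be sketched. The code below
is an illustration under assumptions of its own, not the algebra the
paper calls for: the marker names, the three-valued comparison and the
notion of a 'complete' answer are all invented here. It shows how a
domain extended with 'unknown' and 'not applicable' lets a query report
what it could not decide.

```python
from enum import Enum


class Marker(Enum):
    """Hypothetical extra domain values."""
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "not applicable"


def equals(value, target):
    """Three-valued comparison: True, False, or None when undecidable."""
    if value is Marker.UNKNOWN:
        return None
    if value is Marker.NOT_APPLICABLE:
        return False
    return value == target


def answer(rows, column, target):
    """Return definite matches, possible matches, and a completeness flag."""
    definite, possible = [], []
    for row in rows:
        verdict = equals(row[column], target)
        if verdict is True:
            definite.append(row["id"])
        elif verdict is None:
            possible.append(row["id"])
    # The answer is complete only if no row was left undecided.
    return definite, possible, not possible
```

The list of possible matches is a direct statement of what other data
is still needed to answer the query: the unknown values in those rows.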

The  immediate  payoff  would  be  in  direct  support  to  information 
analysts  either  using  the  algebra  directly  or  through  application 
programs. 

In the longer term this extended relational algebra could provide a
badly needed rigorous mathematical foundation for development of
artificial intelligence techniques. Extended relational algebra will
also facilitate interfacing artificial intelligence with data held in
data bases of any size and characteristics, since the information level
of any data base is well defined in terms of relational algebra.

Another approach to interfacing artificial intelligence programs
with data held in DBMS has been developed by SDC.(23) They have defined
a semantic model patterned after the AI 'concept net' approach and
demonstrated isomorphism between that and a DBMS model they also
developed. They expect this to lead to implementation of a simple model
which will simultaneously serve as a knowledge base for an expert system
and a schema for data base applications.

Artificial Intelligence, Decision Aids and Knowledge Management
bring us full circle. They are unavoidably dependent upon values,
whether or not these values are explicitly stated. They will be of
maximum value only when they rest upon sound theory. Until then, they
are at best 'ad hockery.'(24)

One  of  the  most  difficult  aspects  of  theory  to  discuss  is  data 
representation.  A  data  base  represents  some  aspects  of  reality  in 
symbols  that  can  be  held  in  computers.  Some  representation  problems  are 
long  solved  and  readily  agreed  upon;  time  is  divided  into  days,  hours,
minutes,  etc.,  and  the  only  decision  is  which  of  a  small  number  of
representations  shall  be  chosen  for  a  specific  system.  Sometimes  the
representation  chosen  interacts  seriously  with  new  uses  for  the  data.
For  example,  the  decision  to  manage  data  in  terms  of  fixed  installations
makes  dealing  with  mobile  objects  almost  intractable.  We  don't  know  how
to  deal  with  multiple  representations  of  the  same  object.  Attempts  to
address  this  problem  lead  to  fundamental  philosophical  problems,  and
philosophers  have  not  yet  come  to  terms  with  the  need  to  solve  very 
mundane  examples  of  their  very  abstract  theorizing. 
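
The fixed-installation example can be made concrete with a minimal sketch. Everything in it is hypothetical and introduced only for illustration: the class names, the field names, the coordinates, and the use of minutes as the time unit are assumptions, not part of any system described in this paper. The first record type treats location as a permanent attribute of the object; the second holds a time-stamped history, which is roughly what a mobile object requires before the question "where was it at time t?" can even be asked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Location = Tuple[float, float]  # (latitude, longitude); an assumed convention


@dataclass
class FixedInstallation:
    """Hypothetical record: location is a permanent attribute of the object."""
    name: str
    location: Location


@dataclass
class MobileObject:
    """Hypothetical record: location is a history of time-stamped sightings."""
    name: str
    sightings: List[Tuple[int, Location]] = field(default_factory=list)

    def record(self, minute: int, location: Location) -> None:
        # Keep sightings ordered by time so lookups can take the latest one.
        self.sightings.append((minute, location))
        self.sightings.sort(key=lambda s: s[0])

    def location_at(self, minute: int) -> Optional[Location]:
        # Most recent recorded location at or before `minute`, if any.
        known = [loc for t, loc in self.sightings if t <= minute]
        return known[-1] if known else None


depot = FixedInstallation("depot", (43.2, -75.4))
truck = MobileObject("truck")
truck.record(0, (43.2, -75.4))
truck.record(90, (42.9, -75.0))
```

A system built entirely around the first record type has no place to put the second sighting of the truck. The sketch does not touch the harder problem raised above, which is reconciling several representations of the same object.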

As  previously  noted,  we  need  the  capability  to  identify  and  track 
more  than  one  concept  at  a  time.  We  need  to  recognize  that  it  is
always  in  support  of  more  than  one  concept.  We  also  urgently  need
theory  that  we  trust  enough  so  that  when  concepts  are  proven  incorrect 
we  accept  the  proof.  Then  we  need  methodology  to  deal  with  the 
disproven  concepts.  Sometimes  we  will  be  able  to  use  the  proof  that  our 
concepts  are  incorrect  to  develop  better  concepts.  Sometimes  we  will 
only  know  that  what  we  thought  was  correct  is  wrong,  without  enough
knowledge  to  identify  what  is  right.  This  is  the  most  difficult  of  all
intellectual  situations.  We  have  to  be  able  to  accept  the  fact  that  we
don't  know.  We  must  also  find  a  way  to  describe  what  is  happening  well 
enough  to  bring  brains  to  bear  on  the  newly  identified  missing  concepts. 
The  methodology  we  require  must  be  a  validated  theory  about  knowledge. 
It  must  also  be  representable  in  digital  processors  if  it  is  to  be  a
viable  foundation  for  information  resource  management. 

One  promising  approach  to  developing  this  methodology  is  the 
state-of-affairs  descriptive  technology  developed  by  Dr.  Peter  Ossorio
at  the  University  of  Colorado  at  Boulder  under  the  rubric  Descriptive 
Psychology. 

State  of  Affairs  Technology  (25)(26)(27)  maintains  that  information
analysis  is  at  once  both  completely  principled  yet  completely
context-dependent.  To  deal  effectively  with  these,  a  number  of
techniques  have  been  developed.  These  techniques  include  Paradigm  Case
Formulation  as  opposed  to  'definition'  (28);  Judgement-Space  as  opposed
to  'identification'  (29);  State  of  Affairs  Systems  as  opposed  to
'frames'  (30);  Intentional  Action  Systems  as  opposed  to  'scripts'  (31);
and  Ex  Post  Facto  Formulation  as  opposed  to  'time'  (32).  These
techniques  have  been  successfully  applied  to  solving  problems  in  expert
systems  (33),  linguistic  data  processing  (34),  and  automatic  fact
analysis  (25).  The  evidence  of  these  successes  suggests  that  the
state-of-affairs  technology  is  a  fruitful  basis  for  research  in
representation,  and  the  development  and  validation  of  concepts.

Until  we  have  agreed-upon  formalisms  by  which  we  can  manage  our
conceptual  models  of  reality,  capable  of  representing  multiple
world-views,  we  do  not  have  valid  theory.  Until  then,  our  application
programs,  decision  aids,  knowledge  management  and  artificial  intelligence
programs  can  only  be  validated  in  experiential  terms.  We  will  have  no
reliable  procedures  to  be  sure  they  apply  in  new  situations,  and  new
situations  are  precisely  where  they  will  be  most  needed.

MISSION

of

Rome  Air  Development  Center

RADC  plans  and  executes  research,  development,  test  and
selected  acquisition  programs  in  support  of  Command,  Control,
Communications  and  Intelligence  (C3I)  activities.  Technical
and  engineering  support  within  areas  of  technical  competence
is  provided  to  ESD  Program  Offices  (POs)  and  other  ESD
elements.  The  principal  technical  mission  areas  are
communications,  electromagnetic  guidance  and  control,
surveillance  of  ground  and  aerospace  objects,  intelligence  data
collection  and  handling,  information  system  technology,
ionospheric  propagation,  solid  state  sciences,  microwave
physics  and  electronic  reliability,  maintainability  and
compatibility.


